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Abstract 

This  report  is  a  repository  for  the  results  obtained  from  a  large  scale  empirical  comparison  of  seven  itera¬ 
tive  and  evolution-based  optimization  heuristics.  Twenty-seven  static  optimization  problems,  spanning 
six  sets  of  problem  classes  which  are  commonly  explored  in  genetic  algorithm  literature,  are  examined. 
The  problem  sets  include  job-shop  scheduling,  traveling  salesman,  knapsack,  binpacking,  neural  network 
weight  emtimization,  and  standard  numerical  optimization.  The  search  spaces  in  these  problems  range 
from  2^^^to  2^^"^^.  The  results  indicate  that  using  genetic  algorithms  for  the  optimization  of  static  functions 
does  not  yield  a  benefit,  in  terms  of  the  final  answer  obtained,  over  simpler  optimization  heuristics. 
Descriptions  of  the  algorithms  tested  and  the  encodings  of  the  problems  are  described  in  detail  for  repro¬ 
ducibility. 
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1.  INTRODUCTION 

Genetic  algorithms  (GAs)  and  other  evolutionary  procedures  are  commonly  used  for  static  function  opti¬ 
mization.  Although  there  has  been  growing  evidence  that  methods  such  as  GAs  are,  in  general,  not  well 
suited  in  this  domain  [De  Jong,  1992],  a  large  amoimt  of  research  has  been  devoted  to  improving  their 
effectiveness  for  function  optimization.  Hybrid  mechanisms,  ranging  from  alternate  evolutionary  meth¬ 
ods  to  specialized  operators  and  representations  which  can  intelligently  use  problem  specific  informa¬ 
tion,  have  achieved  good  results  in  many  specific  applications.  Nonetheless,  relatively  few  of  these 
techniques  work  well  across  a  wide  range  of  problems. 

The  aim  of  this  paper  is  to  compare  two  standard  genetic  algorithms  with  simpler  methods  of  optimiza¬ 
tion:  multiple-restart  stochastic  hillclimbing  (MRSH)  and  population-based  incremental  learning  (PBIL). 
Previous  comparisons  between  forms  of  MRSH  and  GAs  can  be  found  in  [Ackley,  1994],  [Juels  &  Watten- 
berg,  1994],  [Forrest  &  Mitchell,  1992],  [MitcheU  &  Holland,  1994],  and  [Davis,  1991],  to  name  a  few.  A 
comparison  between  GAs  and  PBIL  has  been  made  in  [Baluja,  1994][Baluja  &  Caruana,  1995].  This  paper 
provides  a  large  scale  empirical  comparison  of  these  algorithms  on  problems  commonly  found  in  GA  lit¬ 
erature.  Three  variants  of  MRSH,  two  variants  of  PBIL,  and  two  GAs  are  compared. 


1.1  The  Aims  of  this  Paper 

This  study  aims  at  answering  only  one  question:  "How  effective  are  standard  GAs  for  optimizing  static 
functions,  given  a  set  number  of  function  evaluations,  in  comparison  to  other,  simpler,  algorithms?"  This 
paper  presents  results  on  many  large  problems;  the  size  and  quantity  of  the  problems  makes  it  hard  to 
give  in-depth  analysis  of  the  results  beyond  the  algorithms'  relative  performances.  A  more  in-depth  anal¬ 
ysis  of  PBIL  in  comparison  to  standard  GAs  on  a  problem  which  was  specifically  designed  to  be  easy  for 
the  genetic  algorithm  (and  easier  to  analyze  than  the  problems  explored  here)  is  provided  in  [Baluja  & 
Caruana,  1995].  This  paper  does  not  attempt  to  address  the  problem  of  whether  the  classes  of  problems 
investigated  are  suited  for  evolutionary  or  iterative  fimction  optimization.  The  focus  of  this  paper  is  on 
comparing  seven  static  function  optimization  methods  on  problems  which  are  representative  of  problems 
commonly  used  as  benchmarks  in  GA  literature.  No  problem  specific  features  have  been  added  to  any  of 
the  algorithms;  all  of  the  mechanisms  used  in  the  algorithms  are  "standard",  and  have  been  explored  and 
described  in  the  applicable  literature.  The  inclusion  of  problem  specific  mechanisms  or  more  sophisti¬ 
cated  features  has  the  potential  to  improve  the  performance  of  all  the  algorithms. 

There  are  two  major  concerns  with  performing  a  purely  empirical  comparison  of  these  algorithms.  The 
first  is  that  each  of  these  algorithms  is  defined  by  control  parameters,  and  it  is  prohibitively  expensive,  in 
practice,  to  thoroughly  explore  the  space  of  the  parameters  while  providing  breadth  in  the  types  and 
sizes  of  problems  attempted.  The  GA  parameters  used  here  were  chosen  to  work  well  on  many  of  the 
problems,  but  are  not  biased  to  any  particular  single  problem.  The  parameters  for  the  other  algorithms 
were  chosen  in  the  same  manner.  In  addition,  GAs  were  selected  to  perform  well  on  the  task  of  optimiza¬ 
tion;  they  use  mechanisms,  such  as  elitist  selection  and  scaling  of  fitness  values,  which  are  often  used  for 
the  optimization  of  static  functions  [De  Jong,  1992].  One  of  the  goals  of  this  study  is  to  use  the  algorithms  with 
as  little  problem-specific  knowledge  as  possible.  The  only  problem-specific  knowledge  used  in  these  algo¬ 
rithms  is  the  number  of  bits  in  the  solution  encoding  for  each  of  the  problems. 

The  second  concern  is  that  there  are  many  criteria  by  which  the  effectiveness  of  each  algorithm  can  be 
measured.  As  mentioned  before,  there  has  recently  been  some  controversy  in  the  GA  community  as  to 
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whether  GAs  should  be  used  for  static  function  optimization.  One  of  the  reasons  for  this  controversy  is 
that  GAs  "attempt  to  maximize  the  cumulative  payoff  of  a  sequence  of  trials"  [De  Jong,  1992]  rather  than 
attempt  to  find  the  single  best  optimum.  Therefore,  using  the  "best  answer  found"  criteria  may  not  be  the 
best  way  to  measure  the  GA's  abilities.  Nonetheless,  a  considerable  amount  of  effort  has  been  devoted  to 
making  the  GAs  better  in  function  optimization.  "Better"  has  usually  been  measured  in  terms  of  the  best 
solution  foimd  in  a  given  number  of  trials.  The  common  forms  of  measurement  for  function  optimization 
are  on-line  and  off-line  performance.  On-line  performance  measures  the  average  of  all  function  evalua¬ 
tions  up  to  and  including  the  current  evaluations.  Off-line  performance  is  a  running  average  of  the  best 
performance  values  to  a  particular  time.  Other  measurements  include  the  best  solution  found  in  the  final 
generation  and  the  best  solution  foimd  in  any  generation  through  the  search.  Although  all  these  mea¬ 
sures  reveal  different  insights  into  the  search  algorithm's  ability,  the  measure  we  are  interested  in  this 
study  is  the  best  solution  ever  found  through  the  search.  The  issues  of  cumulative  payoff,  on-line  and  off¬ 
line  performance  are  not  addressed  here.  The  effectiveness  of  each  algorithm  is  based  solely  upon  the  best 
answer  it  can  find  in  the  given  number  of  trials. 

It  is  important  to  understand  the  scope  of  these  results.  All  of  the  empirical  comparisons  are  based  upon 
static  function  optimization  problems.  The  performance  of  each  method  is  judged  solely  by  the  best  solu¬ 
tion  foimd  during  the  run,  given  a  pre-specified  number  of  total  evaluations.  Therefore,  the  following 
classes  of  problems  are  not  considered  here,  and  should  be  explored  in  the  future: 


•  Noise  in  the  evaluation  function  [Grefenstette  &  Fitzpatrick,  1988]. 

•  A  changing,  or  time-varying,  evaluation  function  (over  the  period  of  a  single  run)  [Cobb,  1993]. 

•  Problems  in  which  queries  have  an  associated  cost,  which  must  also  be  minimized  [Cohn, 
1994]. 

•  Problems  in  which  multiple  "solution  vectors"  must  interact  [Langton,  1994]. 

•  Problems  in  which  cumulative  payoff  is  to  be  optimized  [Holland,  1975],  [Golberg,  1989]. 

•  Problems  which  use  variable-length  encodings,  or  encodings  with  change  over  time  [Koza, 
1992]. 


Although  the  above  domains  are  not  addressed  here,  the  domain  which  is  concentrated  upon  covers  a 
wide  variety  of  problems.  A  large  portion  of  GA  research  has  been  devoted  to  the  types  of  problems  ana¬ 
lyzed  in  this  paper.  The  field  of  Operations  Research  is  another  source  of  many  similar  problems. 

The  next  section  describes  the  simplest  algorithm  tested,  multiple-restart  stochastic  hiUclimbing.  This  sec¬ 
tion  is  followed  by  descriptions  of  genetic  algorithms,  in  section  3,  and  population-based  incremental 
learning  in  section  4.  In  section  5,  the  problems  attempted  and  the  results  obtained  are  described  together. 
Section  6  summarizes  the  empirical  results.  Section  7  concludes  the  report  and  suggests  some  areas  for 
future  studies. 
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2.  MULTIPLE-RESTART  STOCHASTIC  HILLCLIMBING 

Multiple-restart  stochastic  hillclimbing  (MRSH)  is  a  method  of  iterative  optimization  of  static  functions.  It 
is  the  simplest  of  the  optimization  procedures  explored  in  this  paper.  [Wattenberg  and  Juels,  1994]  have 
compared  one  version  of  stochastic  hillclimbing  with  GAs  on  several  problems  commonly  used  for  gaug¬ 
ing  genetic  algorithms  and  genetic  programming,  and  have  achieved  very  promising  results.  The  basic 
stochastic  hillclimbing  algorithm  is  shown  in  Figure  1. 


V  <—  randomly  generate  solution  vector 
Best  <-  evaluate  (V) 

loop#  ITERATIONS 

N  ^  Flip_Random_Bit  (V) 
if  (evaluate  (N)  >  Best) 

Best  i-  evaluate(N) 

V^N 

Flip_Random_Bit  is  a  function  which  returns  a  solution  string  with  only  one  bit  changed  from  its  input  solution  string. 

Figure  1:  The  stochastic  hillclimbing  algorithm  for  binary  solution  vectors.  In  the  full  algorithm, 
the  best  vector  along  with  its  evaluation  would  be  saved.  In  practice  the  algorithm  could  be 
restarted  in  random  locations  many  times  -  and  the  best  solution  ever  found  returned. 


Three  variants  of  this  algorithm  are  explored  in  this  paper.  The  first  variant,  (MRSH-1)  maintains  a  list  of 
the  position  of  the  bit  flips  which  were  attempted  without  improvement.  These  bit  flips  are  not  attempted 
again  imtil  a  better  solution  is  found.  When  a  better  solution  is  found,  the  list  is  emptied.  If  the  list 
becomes  as  large  as  the  solution  encoding,  then  no  single  bit  flip  can  improve  the  solution.  In  this  case, 
MRSH-1  is  restarted  at  a  random  location  with  an  empty  list. 

The  second  and  third  variants  of  stochastic  hUlclimbing,  (MRSH-2  &  MRSH-3),  allow  moves  to  regions  of 
higher  and  equal  evaluation.  This  is  different  than  MRSH-1,  which  only  allows  moves  to  regions  of  higher 
evaluation.  MRSH-2  &  3  differ  from  each  other  in  the  number  of  evaluations  allowed  before  restarting 
search  in  a  random  location.  In  MRSH-2,  the  number  of  evaluations  is  dependent  upon  the  length  of  the 
encoded  solution.  MRSH-2  allows  10*(length  of  solution)  evaluations  without  improvement  before  search  is 
restarted.  When  a  solution  with  a  higher  evaluation  is  foimd,  the  count  is  reset.  MRSH-3  enforces  a  much 
stricter  policy  of  restart;  after  the  total  number  of  iterations  is  specified,  restart  is  forced  5  times  during 
search,  at  equally  spaced  intervals. 


3.  GENETIC  ALGORITHMS 

Genetic  algorithms  (GAs)  are  biologically  motivated  adaptive  systems  which  are  based  upon  the  princi¬ 
ples  of  natural  selection  and  genetic  recombination.  A  GA  combines  the  principles  of  survival  of  the  fit¬ 
test  with  a  randomized  information  exchange.  It  has  the  ability  to  recognize  trends  toward  optimal 
solutions,  and  to  exploit  such  information  by  guiding  the  search  toward  them. 

In  the  standard  GA,  candidate  solutions  are  encoded  as  fixed  length  vectors.  The  initial  group  of  potential 
solutions  is  chosen  randomly.  These  candidate  solutions,  called  "chromosomes,"  are  allowed  to  evolve 
over  a  number  of  generations.  At  each  generation,  the  fitness  of  each  chromosome  is  calculated;  this  is  a 
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measure  of  how  well  the  chromosome  optimizes  the  objective  function.  The  subsequent  generation  is  cre¬ 
ated  through  a  process  of  selection,  recombination,  and  mutation.  The  chromosomes  are  probabilistically 
selected  for  recombination  based  upon  their  fitness.  General  recombination  (crossover)  operators  merge 
the  information  contained  within  pairs  of  selected  "parents"  by  placing  random  subsets  of  the  informa¬ 
tion  from  both  parents  into  their  respective  positions  in  a  member  of  the  subsequent  generation. 
Although  the  chromosomes  with  high  fitness  values  have  a  higher  probability  of  selection  for  recombina¬ 
tion  than  those  with  low  fitness  values,  they  are  not  guaranteed  to  appear  in  the  next  generation.  Due  to 
the  random  factors  involved  in  producing  "children"  chromosomes,  the  children  may,  or  may  not,  have 
higher  fitness  values  than  their  parents.  Nevertheless,  because  of  the  selective  pressure  applied  through  a 
number  of  generations,  the  overall  trend  is  towards  higher  fitness  chromosomes.  Mutations  are  used  to 
help  preserve  diversity  in  the  population.  Mutations  introduce  random  changes  into  the  chromosomes.  A 
good  overview  of  GAs  can  be  found  in  [Goldberg,  1989]  [De  Jong,  1975]. 

Two  variants  of  the  traditional  genetic  algorithm  are  tested  in  this  study.  The  first,  SGA,  has  the  following 
parameters:  Two-Point  crossover,  with  a  Crossover  Rate  of  100%,  Mutation  Rate:  0.001,  Population  Size: 
100,  Elitist  selection  (the  best  chromosome  in  generation  N  replaces  the  worst  chromosome  in  generation 
N+1).  The  second  GA  used,  termed  GA-Scale,  uses  the  same  parameters,  with  the  following  exceptions: 
Uniform  crossover  with  a  crossover  rate  of  80%,  and  the  fitness  of  the  worst  member  in  a  generation  is 
subtracted  from  the  fitnesses  of  each  member  of  the  generation  before  the  probabilities  of  selection  are 
determined.  Both  GAs  are  generational,  and  both  employ  the  elitist  selection  mechanism  described 
above. 


4.  POPULATION-BASED  INCREMENTAL  LEARNING 

Population-based  incremental  learning  (PBIL)  is  a  combination  of  evolutionary  optimization  and  hill¬ 
climbing  [Baluja,  1994].  The  object  of  the  algorithm  is  to  create  a  real  valued  probability  vector  which, 
when  sampled,  reveals  high  quality  solution  vectors  with  high  probability.  For  example;  if  a  good  solu¬ 
tion  to  a  problem  can  be  encoded  as  a  string  of  alternating  O's  and  I's,  a  suitable  final  probability  vector 
would  be  0.01, 0.99, 0.01, 0.99,  etc. 

Initially,  the  values  of  the  probability  vector  are  set  to  0.5.  Sampling  from  this  vector  yields  random  solu¬ 
tion  vectors  because  the  probability  of  generating  a  1  or  0  is  equal.  As  search  progresses,  the  values  in  the 
probability  vector  gradually  shift  to  represent  high  evaluation  solution  vectors.  This  is  accomplished  as 
follows:  A  number  of  solution  vectors  are  generated  based  upon  the  probabilities  specified  in  the  proba¬ 
bility  vector.  The  probability  vector  is  pushed  towards  the  generated  solution  vector(s)  with  the  highest 
evaluation.  The  distance  the  probability  vector  is  pushed  depends  upon  the  learning  rate  parameter. 
After  the  probability  vector  is  updated,  a  new  set  of  solution  vectors  is  produced  by  sampling  from  the 
updated  probability  vector,  and  the  cycle  is  continued.  As  the  search  progresses,  entries  in  the  probability 
vector  move  away  from  their  initial  settings  of  0.5  towards  either  0.0  or  1.0.  The  probability  vector  can  be 
viewed  as  a  prototype  vector  for  generating  solution  vectors  which  have  high  evaluations  with  respect  to 
the  available  knowledge  of  the  search  space. 

This  algorithm  is  an  extension  of  the  Equilibrium  Genetic  Algorithm  developed  in  conjxmction  with 
[Juels,  1993,  1994].  Another  algorithm  related  to  EGA/PBIL  is  Bit-Based  Simulated  Crossover  (BSC) 
[Syswerda,  1992][Eshelman  &  Schaffer,  1993],  BSC  regenerates  the  probability  vector  at  each  generation; 
it  also  uses  selection  probabilities  (as  do  standard  GAs)  to  generate  the  probability  vector.  In  contrast, 
PBIL  does  not  regenerate  the  probability  vector  at  each  generation,  rather,  the  probability  vector  is 
updated  through  the  search  procedure.  Additionally,  PBIL  does  not  use  selection  probabilities.  Instead,  it 
updates  the  probability  vector  using  a  few  (in  these  experiments  1)  of  the  best  performing  individuals. 
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The  maimer  in  which  the  updates  to  the  probability  vector  occur  is  similar  to  the  weight  update  rule  in 
supervised  competitive  learning  networks,  or  the  update  rules  used  in  Learning  Vector  Quantization 
(LVQ)  [Hertz,  Krogh  &  Palmer,  1993].  Many  of  the  heuristics  used  to  make  learning  more  effective  in 
supervised  competitive  learning  networks  (or  LVQ),  or  to  increase  the  speed  of  learning,  can  be  used  with 
the  PBIL  algorithm.  This  relationship  is  discussed  in  greater  detail  in  [Baluja,  1994]. 

4.1  PBIL's  Relation  to  Genetic  Algorithms 

One  key  feature  of  the  early  portions  of  genetic  optimization  is  the  parallelism  in  the  search;  many  diverse 
points  are  represented  in  the  population  of  early  generations.  As  the  search  progresses,  the  population  of 
the  GA  tends  to  converge  around  a  good  solution  vector  in  the  fimction  space  (the  respective  bit  positions 
in  the  majority  of  the  solution  strings  converge  to  the  same  value).  PBIL  attempts  to  create  a  probability 
vector  that  is  a  prototype  for  high  evaluation  vectors  for  the  fimction  space  being  explored.  As  search 
progresses  in  PBIL,  the  values  in  the  probability  vector  move  away  from  0.5,  towards  either  0.0  or  1.0. 
Analogously  to  genetic  search,  PBIL  converges  from  initial  diversity  to  a  single  point  where  the  probabil¬ 
ities  are  close  to  either  0.0  or  1.0.  At  this  point,  there  is  a  high  degree  of  similarity  in  the  vectors  generated. 

Because  PBIL  uses  a  single  probability  vector,  it  may  seem  to  have  less  expressive  power  than  a  GA  using 
a  full  population  that  can  represent  a  large  number  of  points  simultaneously.  For  example,  in  Figure  2,  the 
vector  representations  for  populations  #1  and  #2  are  &e  same  although  the  members  of  the  two  popula¬ 
tions  are  quite  different.  This  appears  to  be  a  fundamental  limitation  of  PBIL;  a  GA  would  not  treat  these 
two  populations  the  same.  A  traditional  single  population  GA,  however,  would  not  be  able  to  maintain 
either  of  these  populations.  Because  of  sampling  errors,  the  population  will  converge  to  one  point;  it  will 
not  be  able  to  maintain  multiple  dissimilar  points.  This  phenomenon  is  summarized  below: 


"...  the  theorem  [Fundamental  Theorem  of  Genetic  Algorithms  [Goldberg,  1989]], 
assumes  an  infinitely  large  population  size.  In  a  finite  size  population,  even  when  there  is 
no  selective  advantage  for  either  of  two  competing  alternatives...  the  population  will 
converge  to  one  alternative  or  the  other  in  finite  time  (De  Jong,  1975;  [Goldberg  &  Seg- 
rest,  1987]).  This  problem  of  finite  populations  is  so  important  that  geneticists  have  given 
it  a  special  name,  genetic  drift.  Stochastic  errors  tend  to  accumulate,  ultimately  causing 
the  population  to  converge  to  one  alternative  or  another"  [Goldberg  &  Richardson,  1987]. 


Similarly,  PBIL  will  converge  to  a  probability  vector  that  represents  one  of  the  two  solutions  in  each  of  the 
populations  in  Figure  2;  the  probability  vector  can  ordy  represent  one  of  the  dissimilar  points. 


In  addition  to  moving  the  prototype  vector  towards  the  highest  evaluation  vector,  the  prototype  vector 
can  also  be  moved  away  from  the  lowest  evaluation  vector  generated  in  each  generation.  However,  as  the 
prototype  vector  becomes  fixed  towards  either  0.0  or  1.0  for  each  bit  position,  the  hamming  distance 
between  the  best  and  worst  generated  vectors  will  diminish.  If  the  hamming  distance  between  the  best 
and  worst  vector  is  small,  moving  away  from  the  worst  vector  is  counter-productive,  because  it  also 
moves  away  from  the  best  vector  in  many  of  the  bit  positions.  Instead,  the  probability  vector  can  be 
moved  away  from  the  values  in  the  worst  vector  which  differ  from  those  in  the  respective  positions  of  the 
best  vector.  The  full  algorithm  is  shown  in  Figure  3. 

In  this  study,  two  variants  of  the  algorithm  shown  in  Figure  3  are  used.  The  first,  PBIL,  uses  the  following 
parameters:  Mutation  Probability:  0.02,  Mutation  Shift:  0.05,  Learning  Rate:  0.1,  and  Negative  Learning 
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Population  #1 
0011 
1100 
1100 
0011 

Representation 
0.5, 0.5, 0.5, 0.5 


Population  #2 
1010 
0101 
1010 
0101 

Representation 
0.5  0.5  0.5  0.5 


Figure  2:  The  probability  representation  of  2  small  populations  of  4-bit  solution 
vectors;  population  size  is  4.  Notice  that  the  representations  for  both  populations 
are  the  same,  although  the  solution  vectors  represented  are  entirely  different. 

Rate:  0.075.  The  second  algorithm,  the  PBIL/EGA  algorithm,  uses  the  same  parameters  with  the  Negative 
Learning  Rate  set  to  0.0. 


5.  AN  EMPIRICAL  COMPARISON 

In  this  section,  the  algorithms  described  previously  are  applied  to  six  classes  of  problems:  Traveling  Sales¬ 
man,  jobshop  scheduling,  knapsack,  bin  packing,  neural  network  weight  optimization,  and  numerical 
function  optimization.  The  results  obtained  in  this  study  should  not  be  considered  to  be  state-of-the-art. 
The  problem  encodings  were  chosen  to  be  easily  reproducible,  and  to  allow  easy  and  fair  comparison 
with  other  studies.  Alternate  encodings  may  yield  superior  results.  In  addition,  no  problem-specific 
information  was  used  for  any  of  the  algorithms.  In  the  cases  in  which  problem-specific  information  is 
available,  it  may  be  able  to  help  all  of  the  search  algorithms  presented  in  this  study. 

In  the  problems  presented  in  this  paper,  all  of  the  variables  were  encoded  either  with  Gray-code  or  stan¬ 
dard  base-2  representation,  as  indicated  with  the  problem.  The  variables  were  represented  in  non-over¬ 
lapping,  contiguous  positions  within  the  chromosome  (solution  encoding).  The  results  reported  are  the 
best  evaluations  foimd  through  the  search  of  each  algorithm,  averaged  over  at  least  20  independent  nms 
per  algorithm  per  problem.  In  the  problems  in  which  random  values  are  assigned  to  problem  attributes 
(such  as  the  location  of  cities  in  the  Traveling  Salesman  Problems  or  sizes  of  elements  in  the  bin  packing 
and  knapsack  problems),  the  values  are  consistent  across  all  algorithms  attempted  and  across  all  20  trials 
for  each  algorithm. 

All  algorithms  were  allowed  an  equal  number  of  evaluations  per  run  (200,000).  In  each  run,  the  GA  and 
PBIL  algorithms  both  were  allowed  2000  generations,  with  100  function  evaluations  per  generation.  In 
each  run,  the  MRSH  algorithms  were  restarted  in  random  locations  as  many  times  as  needed  until 
200,000  evaluations  were  performed.  The  best  answer  ever  found  in  the  200,000  evaluations  was  returned 
as  the  best  answer  found  in  the  run.  The  final  results  for  the  problems  are  given  in  tables  following  the 
description  of  the  problems.  The  best  results  are  highlighted. 

5.1  Traveling  Salesman  Problems  (TSP) 

The  TSP  problem  is  probably  the  most  famous  of  the  NP-complete  problems.  Given  N  cities,  the  object  is 
to  find  a  minimum  length  tour  which  visits  each  city  exactly  once.  The  encoding  used  in  this  study 
requires  a  bit  string  of  size  Nlog2N  bits.  Each  city  is  assigned  a  substring  of  length  log2N  which  is  inter¬ 
preted  as  an  integer.  The  city  with  the  lowest  integer  value  comes  first  in  the  tour,  the  city  with  the  second 
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******  Initialize  Probability  Vector  ***** 
for  i  :=1  to  LENGTH  do  P[i]  =  0.5; 

while  (NOT  termination  condition) 

*****  Generate  Samples  ***** 
for  i  :=1  to  SAMPLES  do 

sample_vectors[i]  :=  generate_sample_vector_according_to__probabilitles  (P); 
evaluatlons[i]  :=Evaluate_Solution  (sample [i]); 

best_vector  :=  find_vector_with_best_evaluation  (sample_vectors,  evaluations); 
worst_vector  :=  find_vector_with_worst_evaluation  (sample_vectors,  evaluations); 

*****  Update  Probability  Vector  towards  best  solution  ***** 
for  i  :=1  to  LENGTH  do 

P[i]  :=  P[i]  *  (1.0  -  LR)  +  best_vector[i]  *  (LR); 


*****  Update  Probability  Away  from  Worst  Solution  ***** 
fori:=1  to  LENGTH  do 

If  (best_vector[i]  ^  worst„vector[i])  then 

P[i]  :=  P[l]  *  (1 .0  -  NEGATIVE_LR)  +  best_vector[l]  *  (NEGATIVE.LR); 

*****  Mutate  Probability  Vector  ***** 
for  I  :=1  to  LENGTH  do 

if  (random  (0,1)  <  MUT.PROBABILITY)  then 

if  (random  (0,1)  >  0.5)  then  mutate_dlrection  :=  1 
else  mutate_direction  :=  0; 

P[i]  :=  P[i]  *  (1 .0  -  MUT.SHIFT)  +  mutate_direction  *  (MUT_SHIFT); 


USER  DEFINED  CONSTANTS  (Values  Used  in  this  Study): 

SAMPLES:  the  number  of  vectors  generated  before  update  of  the  probability  vector  (100). 

LR:  the  learning  rate,  how  fast  to  exploit  the  search  performed  (0.1). 

NEGATIVE_LR:  the  negative  learning  rate,  how  much  to  learn  from  negative  examples  (PBIL  =  0.075,  EGA  =  0.0). 
LENGTH:  the  number  of  bits  in  a  generated  vector  (problem  specific). 

MUT_PROBABILITY:  the  probability  for  a  mutation  occurring  in  each  position  (0.02). 

MUT„SHiFT:  the  amount  a  mutation  alters  the  value  in  the  bit  position  (0.05). 

Figure  3:  The  PBIL/EGA  algorithm  for  a  binary  alphabet. 


lowest  comes  second,  etc.  In  the  case  of  ties,  the  city  whose  substring  comes  first  in  the  bit  string  comes 
first  in  the  tour.  This  encoding  was  taken  from  [Syswerda,  1992].  To  minimize  the  tour  length,  the  evalua¬ 
tion  used  is  1.0/Tour_Length.  Four  problems  were  attempted:  the  first  contained  128  cities,  the  second 
contained  200  cities,  and  the  third  and  fourth  contained  255  cities.  The  integer  encoding  of  the  fourth 
problem  used  Gray-code,  while  the  rest  used  standard  binary  code.  The  results  for  these  four  problems 
are  shown  in  Table  1.  The  distances  between  cities  were  generated  randomly  for  each  problem. 
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Table  I:  Traveling  Salesman  Problem  -  Average  Final  Tour  Length 


PROBLEM 

MRSHl 

MRSH2 

MRSH2 

EGA 

PBIL 

SGA 

GA- 

Scale 

TSP 128  (binary) 

2516.3 

2122.7 

2144.1 

1980.2 

1718.2 

3256.4 

2275.8 

TSP  200  (binary) 

7815.5 

7707.2 

7642.3 

6993.0 

6097,5 

12012.1 

7966.8 

TSP  255  (binary) 

5002.5 

4201.6 

4381.2 

4587.1 

4545.4 

8756.5 

5598.2 

TSP  255  (Gray-Code) 

5054.3 

3558  7 

4051.0 

4587.1 

4484.3 

8644.8 

5550.5 

5.2  Jobshop  Scheduling  Problems 

Recently,  genetic  algorithms  have  been  applied  to  the  jobshop  scheduling  problem  because  it  is  difficult 
for  conventional  search  based  methods  to  find  near-optimal  solutions  in  a  reasonable  amoimt  of  time 
[Fang  et  aL,  1993],  A  good  description  of  the  jobshop  problem  is  given  by  Fang: 


"In  the  general  job  shop  problem,  there  are  j  jobs  and  m  machines;  each  job  comprises  a 
set  of  tasks  which  must  each  be  done  on  a  different  machine  for  different  specified  pro¬ 
cessing  times,  in  a  given  job-dependent  order.  ...  A  legal  schedule  is  a  schedule  of  job 
sequences  on  each  machine  such  that  each  job's  task  order  is  preserved,  a  machine  is  not 
processing  two  different  jobs  at  once,  and  different  tasks  of  the  same  job  are  not  simulta¬ 
neously  being  processed  on  different  machines.  The  problem  is  to  minimize  the  total 
elapsed  time  between  the  beginning  of  the  first  task  and  the  completion  of  the  last  task 
(the  makespan)"  [Fang  et  ah,  1993]. 

The  problem  is  encoded  in  two  ways.  The  first  encoding  is  derived  from  [Fang  et.  al,  1993].  The  exact 
encoding  can  be  found  in  [Fang  et  al.^  1993]  and  [Baluja,  1994].  The  difference  between  this  encoding  and 
that  used  by  Fang  is  that  in  this  study,  bit  strings  were  used  to  encode  the  integers  (in  standard  binary 
encoding)  in  the  range  of  1..J.  Fang  used  chunks  which  are  atomic  for  the  GA.  Although  the  encoding 
used  here  makes  the  problem  difficult  for  these  optimization  techniques,  it  is  used  to  provide  results 
which  are  comparable  to  other  algorithms.  As  the  makespan  is  to  be  minimized,  the  evaluation  of  the 
potential  solution  is  (1.0/makespan).  Two  standard  test  problems  are  attempted,  a  10  job,  10  machine 
problem  and  a  20-job,  5-machine  problem.  A  description  of  the  problems  can  be  foxmd  in  [Muth  & 
Thompson,  1963].  The  results  are  shown  in  Table  11. 


Table  II:  Jobshop  Scheduling  -  Minimum  Makespan  -  Using  First  Encoding. 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA- 

Scale 

Jobshop  10x10 

1033.0 

1003.4 

1003.9 

984.3 

979.4 

1002.0 

1001.2 

Jobshop  20x5 

1295.3 

1284.6 

1282.9 

1204.8 

1200.5 

1256.3 

1213.9 

The  second  encoding  is  very  similar  to  the  encoding  used  in  the  Traveling  Salesman  Problem.  The  draw¬ 
back  of  this  encoding  is  that  it  uses  more  bits  than  the  previous  one.  Nonetheless,  empirically,  it  revealed 
improved  results.  Each  job  is  assigned  M  entries  of  size  log2(J'^M)  bits.  The  total  length  of  the  encoding  is 
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J*M*log2(J'^M).  Therefore,  each  of  these  problems  is  encoded  in  a  700  bit  solution  vector.  The  value  of  each 
entry  (of  length  log2(J'^M))  determines  the  order  in  which  the  jobs  are  scheduled.  The  job  which  contains 
the  smallest  valued  entry  is  scheduled  first,  etc.  The  order  in  which  the  machines  are  selected  for  each  job 
depends  upon  the  ordering  required  by  the  problem  specification.  The  results  are  shown  in  Table  III.  With 
this  encoding,  six  out  of  the  seven  algorithms  perform  better  than,  or  at  least  as  well  as,  the  first  jobshop 
encoding  presented  (the  performance  of  MRSH-1  does  not  improve  with  this  encoding). 


Table  III:  Jobshop  Scheduling  -  Minimum  Makespan  -  Using  Second  Encoding 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA- 

Scale 

Jobshop  10x10 

1059.0 

968.2 

970.2 

963.4 

960.6 

967.1 

961.4 

Jobshop  20x5 

1360.6 

1217.1 

1215.5 

1186.2 

1182,0 

1230.4 

1190.3 

Jobshop  20x5 
(Randomly  generated) 

1164.5 

1031.2 

1025.5 

995.0 

992.1 

1044.2 

1005.1 

5.3  Knapsack  Problem 

In  the  knapsack  problem,  there  is  a  single  bin  of  limited  capacity,  and  M  elements  of  varying  sizes  and 
values.  The  problem  is  to  select  the  elements  which  will  yield  the  greatest  summed  value  without  exceed¬ 
ing  the  capacity  of  the  bin.  The  evaluation  of  the  quality  of  the  solution  is  judged  in  two  ways:  If  the  solu¬ 
tion  selects  too  many  elements,  such  that  the  summed  size  of  the  elements  is  too  large,  the  solution  is 
judged  by  how  much  it  exceeds  the  capacity  of  the  bin  -  the  less  it  exceeds  the  capacity,  the  better  the  solu¬ 
tion.  If  the  sum  of  the  element  sizes  is  within  the  capacity  of  the  bin,  the  sum  of  the  values  of  the  selected 
elements  is  used  as  the  evaluation.  To  ensure  that  the  solutions  which  overfill  the  bin  are  not  competitive 
with  those  which  do  not,  their  evaluations  are  multiplied  by  a  small  constant.  This  makes  the  invalid 
solutions  competitive  only  when  there  are  no  solutions  in  the  population  which  are  valid.  The  evalua¬ 
tions  are  described  below. 


10  size 

^  allElements 


size 

selectedElements 


^  value 
selectedElements 


if  size  is  greater  than  capacity  of  bin 


if  size  is  less  than  or 
equal  to  capacity  of  bin 


The  weights  and  values  for  each  problem  were  randomly  generated.  In  the  first  two  problems,  having  512 
and  2000  elements  respectively,  a  unique  element  is  represented  by  each  bit.  When  a  bit  is  set  to  1,  the  cor¬ 
responding  element  is  included.  In  the  third  and  fourth  problems,  there  are  100  and  120  unique  elements, 
respectively.  However,  there  are  8  and  32  copies  of  each  element;  the  number  of  elements  of  each  type 
which  are  included  in  the  solution  is  determined  by  interpreting  a  bit  string,  length  3  (log28)  bits  and  5 
(log232)  bits,  into  decimal,  respectively.  The  results  are  given  in  Table  IV.  Note  that  the  SGA  algorithm 
was  unable  to  find  valid  solutions  in  the  second  and  fourth  problems. 
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Table  IV:  Knapsack  Problem  -  Average  of  Best  Values 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA- 

Scale 

Knap  (512  elem.,  1  cpy) 

27.1 

24.1 

26.7 

79.8 

-  80.1 

62.2 

72.3 

Knap  (2000  elem.,  1  cpy) 

41.3 

39.0 

38.8 

,1|1,5  . . 

130.8 

0.0 

135.2 

Knap  (100  elem.,  8  cpy) 

148.4 

136.3 

101.1 

394.9 

403.7 

49.3 

414.2 

Knap(120  elem.,  32  cpy) 

848.7 

801.1 

669.0 

2856.4 

2920.4 

0.0 

2559.1 

5.4  Bin  Packing 

In  the  bin  packing  problem,  there  are  N  bins  of  varying  capacities  and  M  elements  of  varying  sizes.  The 
problem  is  to  pack  the  bins  with  elements  as  tightly  as  possible,  without  exceeding  the  maximum  capac¬ 
ity  of  any  bin.  In  the  problems  attempted  here,  the  error  is  measured  by: 

N 

ERRORS  Y\CAF.- ASSIGNED  I  CAP, the  capacity  of  bin  i 

^  ^  ^  ASSIGNED j  is  the  total  size  of  the  elements  in  bin 

The  solution  is  encoded  in  a  bit  string  of  length  M  *  log2N.  Each  element  to  be  packed  is  assigned  a 
sequential  substrmg  of  length  log2N  whose  value  indicates  the  bin  in  which  the  element  is  placed.  In 
order  to  minimize  the  ERROR,  the  evaluation  of  the  potential  solution  is  l.O/ERROR. 

Four  bin  packing  problems  of  various  sizes  were  tested:  32  bins,  128  elements;  16  bins,  128  elements;  4 
bins,  256  elements;  and  2  bins,  512  elements.  All  of  the  problems  generated  were  guaranteed  to  have  a 
solution  with  0.0  error.  The  results  are  shown  below,  in  Table  V. 


Table  V:  Bin  Packing  Problems  -  Minimum  Error 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA-Scale 

Bin(32  bins, 128  elem.) 

0.62 

0.58 

0.62 

0.49 

0.42 

0.54 

0.51 

Bin(16  bins,128  elem.) 

4,4  X  10‘2 

6.8  X  10"^ 

3.8  X  10'2 

3.0  X  10-2 

2.2  X 10-2 

2.8  X 10'^ 

3.1  X  10'^ 

Bin(4  bins,  256  elem.) 

1.2  xlO"^ 

1.9  xlO'® 

6.0  X  lO'^ 

7.0  X 10-® 

6.9  X 10'® 

7.8  X  10'® 

5.0x10'^ 

Bin  (2  bins,  512  elem.) 

2.4  X 10'^ 

2.4  X  lO'® 

3.5  X 10'® 

4.0x10'® 

:  1.8  X  leg; 

1.2  X 10'^ 

4.8  x  10'^ 

5.5  Evolving  Weights  for  an  Artificial  Neural  Network  (ANN) 

Recently,  evolutionary  algorithms  have  been  used  to  evolve  the  weights  of  artificial  neural  networks.  In 
the  experiments  reported  here,  the  weights  of  two  small  predefined  network  architectures  were  evolved. 
In  the  first  test,  the  object  of  the  neural  network  was  to  identify  the  parity  of  7  inputs.  The  inputs  were 
either  0  (represented  by  -0.5)  or  1  (represented  by  0.5).  If  the  parity  was  1,  the  target  output  is  0.5;  if  the 
parity  was  0,  the  target  output  is  -0.5.  The  evaluation  was  the  sum  of  squares  error  on  the  128  training 
examples.  A  bias  input  (a  unit  whose  input  is  set  to  1.0)  was  also  used;  this  has  connections  to  the  hidden 
and  the  output  units  [Hertz,  Krogh  &  Palmer,  1993].  The  network  architecture  consisted  of  8  input  units 
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(including  bias),  5  hidden  units,  and  1  output  unit.  The  network  was  fully  connected  between  sequential 
layers.  There  were  a  total  of  46  connections  in  the  network,  the  values  of  the  weights  were  restricted  to  the 
range  of  -10.0  to  +10.0.  All  hidden  and  output  units  used  a  sigmoid  activation  function.  Weights  were  rep¬ 
resented  as  binary  and  gray  code,  and  were  assigned  8  non-overlapping  bits  in  the  solution  string. 


In  the  second  two  tests,  eight  real  valued  inputs  were  used.  The  first  two  inputs  represent  the  coordinates 
of  a  point  within  a  square  with  upper  left  corner  (ULC)  of  (-1.0, 1.0)  and  lower  right  corner  (LRC)  of  (1.0, 
-1.0).  The  task  was  to  determine  whether  the  point  fell  into  a  square  region  between  ULC(-0. 75, 0,75),  and 
LRC  (0.75,-0.75)  and  outside  a  smaller  square  with  ULC  (-0.35,  0,35),  and  LRC  (0.35,-0.35).  A  diagram  of 
this  is  shown  in  Figure  5.  5  inputs  contained  random  noise  in  the  region  [-!:+!].  This  noise  was  deter¬ 
mined  in  the  beginning  of  the  run,  and  remained  the  same,  in  each  training  example,  throughout  the  run. 
The  last  input  was  a  bias  unit.  In  total,  the  network  had  8  inputs  (including  bias),  5  hidden  units,  and  1 
output;  this  created  46  connections.  For  training,  100  imiformly  distributed  examples  were  used.  The 
same  representation  and  scaling  of  weights  was  used  as  in  the  previous  problem.  In  these  two  problems, 
weights  were  represented  as  binary  and  Gray  code,  respectively.  The  results  are  shown  in  Table  VI. 


Figure  5:  Training  Examples  for  the 
ANN  Square  Problem. 
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Table  VI:  ANN  Weight  Optimization  -  Sum  of  Squares  Error. 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA- 

Scale 

ANN  PARITY  7(binary) 

16.1 

16.1 

23.3 

12.8 

11.2 

15.0 

13.3 

ANN  PARITY  7(Gray) 

13.7 

15.1 

16.4 

8.3 

8.2 

11.6 

12.3 

ANN  SQUARE  (binary) 

14.1 

14.5 

17.5 

11.6 

10.9 

12.7 

13.0 

ANN  SQUARE  (Gray) 

8.4 

7.9 

9.3 

;;:;6.7 . . 

8.3 

8,9 

9.3 

5.6  Numerical  Function  Optimization 

In  this  section,  the  seven  algorithms  are  compared  on  three  numerical  optimization  problems.  In  the  first 
and  second  problems,  the  variables  in  the  first  portions  of  the  solution  string  have  a  large  influence  on  the 
quality  of  the  rest  of  the  solution;  small  changes  in  their  values  can  cause  large  changes  in  the  evaluation 
of  the  solution.  In  the  third  problem,  each  variable  can  be  set  independently.  Each  variable,  x^,  was  repre¬ 
sented  using  9  bits,  and  was  scaled  uniformly  into  the  range  ±2.56.  To  avoid  a  division  by  zero  error,  a 
small  constant,  C  (=0.00001),  was  added  to  the  denominator  of  each  function.  Each  problem  was  tested 
with  the  variables  represented  in  standard  binary  and  Gray  code.  Results  are  shown  in  Table  VII.  The 
maximization  functions  are: 


y.  =  +  ,,  /  =  2...  100 


100 

C  + 

/  =  2 

-2.56  <jc.  <2.56,  /  =  1...100 

i 


y.=^.+sqy._iy-  =  2. 


.100 


c  + 

100 

i  =  2 

h- 


1.0 

100 

C+  X  |(  (0-024 x(i+l))-;c  j| 


PROBLEM 

MRSHl 

MRSH2 

MRSH3 

EGA 

PBIL 

SGA 

GA- 

Scale 

FI  (Binary)  (xlOO) 

1.04 

1.01 

0.97 

1.93 

2.12 

1.96 

1.72 

FI  (Gray  Code)(xl00) 

1.21 

1.18 

1.17 

2.06 

2.62 

1.92 

1.78 

F2  (Binary)  (xlOO) 

3.08 

3.06 

2.91 

4.00 

4.40 

3.58 

3.68 

F2  (Gray  Code)  (xlOO) 

4.34 

4,38 

4.28 

4.67 

5.61 

3,64 

4.63 

F3  (Binary)  (x  100) 

8.07 

8.10 

7.56 

14.57 

16.43 

9.171 

12.30 

F3  (Gray  Code)  (xlOO) 

416.64 

416.64 

416.64 

331.69 

366.77 

28.35 

210.37 
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6.  SUMMARY  OF  EMPIRICAL  RESULTS 

Many  results  have  been  presented  in  the  previous  section.  This  paper  has  concentrated  on  breadth;  a 
large  number  of  problems  were  attempted  with  seven  optimization  heuristics.  It  should  be  noted  that 
because  all  of  the  algorithms  have  tunable  parameters,  it  is  possible  that  different  settings  may  yield  dif¬ 
ferent  results.  Additional  mechanisms,  which  take  advantage  of  problem  specific  information,  may  also 
improve  the  performance  of  each  of  these  methods.  Nonetheless,  by  selecting  a  variety  of  problems  and 
problem  sizes  to  compare,  all  of  the  algorithms  should  show  their  strengths  and  weaknesses  in  some  por¬ 
tion  of  the  test  set. 

The  relative  ranks  of  the  algorithms  on  all  of  the  problems  are  shown  in  Table  VII;  this  table  ranks  the 
algorithms  with  respect  to  the  average  best  results  produced  over  all  runs.  It  should  be  noted  that  with 
only  20  runs  per  algorithm,  not  all  of  the  differences  are  statistically  significant.  More  details  on  the  differ¬ 
entials  between  each  algorithm's  performance  were  presented  in  Section  5.  In  terms  of  the  final  solutions 
found,  the  PBIL  algorithm  worked  the  best  overall,  followed  by  the  EGA  algorithm.  In  the  majority  of 
problems  attempted  here  (25  out  of  27),  learning  from  negative  examples  improved  the  quality  of  the 
final  solutions  found  (PBIL  performed  better  than  EGA).  Only  in  two  of  the  problems  did  the  negative 
learning  hurt  the  performance  of  the  PBIL  algorithm  (EGA  performed  better  than  PBIL). 

In  terms  of  clock  speed,  the  MRSH  algorithms  worked  the  fastest.  However,  if  the  time  for  each  chromo¬ 
some/  solution  string's  evaluation  is  much  larger  than  the  time  for  the  algorithm's  procedures,  this  bene¬ 
fit  diminishes.  Moves  to  equal  regions  (rather  than  only  strictly  better  regions)  had  mixed  results  overall. 
Nonetheless,  in  several  problems,  such  as  the  jobshop  (both  encodings)  and  TSP  problems,  the  moves  to 
equal  regions  improved  performance.  In  other  problems  sets,  not  explored  here,  it  was  also  found  that 
moves  to  regions  of  equal  evaluations  were  important  for  good  performance  [Juels  &  Wattenberg,  1994]. 
MRSH  did  well  on  the  largest  TSP  examined;  it  was  able  to  find  a  shorter  tour  than  the  other  algorithms, 
(when  encoded  in  binary  and  Gray  Code).  Similarly,  in  F3,  MRSH  was  able  to  take  the  largest  advantage 
of  the  gray  code. 

Although  the  standard  genetic  algorithm  (SGA)  performed  only  as  well  as  the  MRSH  algorithms,  the* 
GA-scale  algorithm  performed  slightly  better.  A  summary  of  the  results  can  be  found  in  Table  VIII.  This 
table  has  the  following  columns: 


1.  In  the  cases  in  which  PBIL  did  better  than  GA-Scale,  this  column  gives  the  generation  in  which 
PBIL  was  able,  on  average,  to  surpass  the  highest  evaluation  GA-Scale  found,  on  average,  in  its 
20  runs.  For  example,  in  the  first  problem:  TSP-128  (binary),  the  highest  evaluation  of  the  GA 
was  2275.8  (Table  I),  by  generation  210,  PBIL  was  able  to  surpass  this  evaluation. 

2.  The  same  numbers  are  given  for  GA~Scale.  For  example,  on  the  knapsackOl  (2000  elem,  1  copy) 
problem,  the  highest  evaluation  PBIL  was  able  to  obtain  was  403.7,  in  generation  1505  GA- 
Scale  was,  on  average,  able  to  surpass  it. 


Columns  (3),(4)  and  (5)  of  Table  VIII  compare  the  three  different  types  of  algorithms: 


3.  Marks  the  problems  on  which  any  form  of  MRSH  was  able  to  do  better  than  GA-Scale. 

4.  Marks  when  any  form  of  MRSH  did  better  than  PBIL. 
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5.  Marks  the  problems  in  which  GA-Scale  outperformed  PBIL. 


Columns  6,7  and  8  of  Table  VIII  compare  different  variations  of  each  algorithm. 


6.  Indicates  the  problems  on  which  moves  to  equal  regions  helped  the  performance  of  MRSH. 
These  are  the  problems  in  which  either  MRSH-2  or  MRSH-3  did  better  than  MRSH-1. 

7.  Marks  the  problems  in  which  learning  from  negative  examples  helped  the  PBIL/ EGA  algo¬ 
rithm.  These  are  the  problems  in  which  PBIL  performed  better  than  EGA. 

8.  Indicates  the  problems  in  which  GA-Scale  did  better  than  SGA.  The  improvement  may  be  due 
to  the  scaling  of  fitness  values  and/or  the  different  crossover  operators  (GA-Scale:  Uniform, 
SGA:  Two  Point). 

For  the  problems  which  were  attempted  with  gray  and  binary  code.  Table  IX  provides  a  list  of  which 
algorithms  benefited  from  using  gray  code. 


7.  CONCLUSIONS 


This  paper  has  presented  results  on  many  problems.  From  the  results  reported  in  this  paper,  it  is  evident 
that  algorithms  which  are  simpler  than  standard  GAs  can  perform  comparably  to  GAs,  on  both  small  and 
large  problems.  Other  studies  have  also  shown  this  for  various  sets  of  problems  [Juels  &  Wattenberg, 
1994]  [Mitchell  &  Forrest,  1992],  etc.  In  studies  analyzing  the  performance  of  GAs  on  particular  problems, 
these  results  suggest  that  analyses  should  include  comparisons  not  only  to  other  GAs,  but  also  to  other 
simpler  methods  of  optimization  before  a  benefit  is  claimed  in  favor  of  GAs.  This  study  did  not  include 
techniques  such  as  Simulated  Annealing  or  Tabu  Search,  which  should  be  included  in  the  future. 

It  is  interesting  to  note  that  the  PBIL  algorithm,  which  does  not  use  the  crossover  operator,  and  redefines 
the  role  of  the  population  to  one  which  is  very  different  than  that  of  a  GA,  performs  either  better  than  or 
comparably  to  a  GA  on  the  majority  of  the  problems.  PBIL  and  GAs  both  generate  new  trials  based  on 
statistics  from  a  population  of  prior  trials.  The  PBIL  algorithm  explicitly  maintains  these  statistics,  while 
the  GA  implicitly  maintains  them  in  its  population.  The  GA  extracts  the  statistics  by  the  selection  and 
crossover  operators.  More  detailed  comparisons  between  these  two  algorithms  can  be  found  in  [Baluja  & 
Caruana,  1995]. 

It  should  be  noted  that  the  relative  performance  of  GAs  in  comparison  to  PBIL  will  improve  as  the  popu¬ 
lation  size  of  the  GA  increases  [Baluja  &  Caruana,  1995].  As  the  population  size  of  the  GA  increases,  the 
GA  will  be  able  to  maintain  more  dissimilar  points  in  its  population,  and  will  therefore  be  able  to  use 
crossover  more  effectively.  On  the  other  hand,  in  its  current  implementation,  the  PBIL  algorithm  only  use 
a  few  solution  vectors  for  updating  the  probability  vector  regardless  of  the  population  size.  Nonetheless, 
the  large  population  size  needed  by  a  GA  is  not  always  feasible  because  of  the  need  to  balance  the  num¬ 
ber  of  generations  required  and  the  total  number  of  evaluations  possible.  However,  even  when  the 
resources  to  use  large  populations  are  available,  a  large  amount  of  empirical  work  has  shown  that  using 
an  "parallel  island-model"  GA  may  be  more  effective  than  a  single  panmictic  population.  The  island- 
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model  evolves  multiple  small  population  in  parallel,  with  only  a  small  amount  of  interaction  between 
subpopulations.  The  parallel  subpopulation  model,  in  many  cases,  has  outperformed  the  use  of  a  single 
large  population;  see  for  example:  [Davidor  et.  al,  1993][Gordon  &  Whitley,  1993][Baluja,  1993][Whitley  & 
Starkweather,  1990].  If  these  parallel  subpopulations  are  used  instead  of  a  single  large  population,  each 
subpopulation  can  be  modeled  with  individual  probability  vectors,  as  in  the  PBIL  algorithm. 

A  GA  with  different  mechanisms,  such  as  non-stationary  mutation  rates,  local  optimization  heuristics, 
parallel  subpopulations,  specialized  crossover,  or  larger  operating  alphabets,  may  perform  better  than 
the  GAs  explored  here.  It  should  be  noted,  however,  that  all  of  these  mechanisms,  with  the  exception  of 
specialized  crossover  operators,  can  be  used  with  PBIL  with  few,  if  any,  modifications. 

Another  direction  to  explore  in  the  future  is  how  these  algorithm  perform  with  alternate  solution  encod¬ 
ings;  in  this  study  only  binary  encodings  were  used.  Although  work  has  already  been  conducted  in  this 
area  with  GAs  (a  good  introduction  to  this  can  be  found  in  [Eshelman  &  Schaffer,  1992]),  how  well  will 
PBIL  or  MRSH  perform  with  these  alternate  encodings?  Finally,  in  this  study,  optimization  was  only 
explored  in  static  environments.  Future  research  should  also  include  search  and  optimization  in  dynamic 
environments,  or  environments  which  require  maximization  of  cumulative  payoff.  The  adaptive  nature 
of  GAs  may  reveal  a  pronounced  benefit  in  these  more  complex  domains. 
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Table  IX:  Algorithms  which  benefited  by  using  Gray  Code  over  standard  binary  code. 
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